Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster content resolve #38074

Merged
merged 8 commits into from
Nov 14, 2017
Merged

faster content resolve #38074

merged 8 commits into from
Nov 14, 2017

Conversation

jrieken
Copy link
Member

@jrieken jrieken commented Nov 10, 2017

This is part of #37541 and reduces the waterfall by a few steps. It does that by reading the file only once and setting up the decoder-stream on the fly and by avoid a block stat-call whenever possible

There are two big changes:

  1. The new resolveStringStream method which by itself just stream contents. It handles encoding, binary files and bails out on very large file. Tho the latter should/can be managed from the caller. That is second part of this change...
  2. which is in resolveTextContent. It builds an IRawTextContent object, ideally in parallel and only waterfall'ish when needed (etag compare). It also checks the size of the file and it will cancel a read-stream when a file is too large. Usually only a very small chunk of text has been loaded/wasted by that. I believe that the better comprise then also blocking on the stat-call.

This change makes the editor be the winner with normal sized files

nov-10-2017 17-41-24

@jrieken jrieken added this to the November 2017 milestone Nov 10, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
@microsoft microsoft deleted a comment from SeniorUI Nov 13, 2017
Copy link
Member

@bpasero bpasero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrieken added some feedback to the code. the stream reading and handling seems correct to me.

Overall I feel that more of the code in textFileService should move into fileService because otherwise the method in fileService is not very useful for any other consumer.

public resolveStringStream(resource: uri, options?: IResolveContentOptions, token: CancellationToken = CancellationToken.None): IStringStream {
const ret = new EventEmitter();

fs.open(resource.path, 'r', (err, fd) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep semantics with how other methods deal with resources, use const absolutePath = this.toAbsolutePath(resource); to pass on to fs.open.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check


fs.open(resource.path, 'r', (err, fd) => {

if (err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the provided resource is a directory (EISDIR), we should keep throwing the error we have for this case:

return TPromise.wrapError<IStreamContent>(new FileOperationError(
	nls.localize('fileIsDirectoryError', "File is directory"),
	FileOperationResult.FILE_IS_DIRECTORY
));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrieken I just noticed that there is also the FILE_NOT_FOUND error that we return from here. That should also be handled by checking for ENOENT.

let contentToken = new CancellationTokenSource();
let completePromise: Thenable<any>;

let statsPromise = this.fileService.resolveFile(resource).then(stat => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has to move into the resolveStringStream method because otherwise the FILE_NOT_MODIFIED_SINCE error will never be thrown for anyone that is using this method outside of the textFileService. If no etag is provided, the stat can run in parallel to the fs.open in resolveStringStream and otherwise we have to run it first.

The stat could then also be used within resolveStringStream to cancel reading the file in chunks if the file size is detected to be too large.

return TPromise.wrap(completePromise).then(() => result);
}

private fillInContents(content: IRawTextContent, resource: URI, options: IResolveContentOptions, token: CancellationToken): Thenable<any> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for not doing all of this from within the file service? We used to have a method resolveStreamContent that was used for the editor model loading that contains both stream and metadata and I think we can preserve all of that also with your solution. Imho the code here should be pushed down and we can just keep resolveStreamContent with your faster implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for not doing all of this from within the file service?

No-one uses the file service like that, only the text file service... My understanding is that every other caller of resolveContent and resolveContentStream cares only about the actual data and the encoding. So, why make everyone pay for this? I'd prefer to see the file service as something more raw which the text file service uses to compose something more complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a good argument other than a) making it harder to test this change if parts of it is spread between fileService and textFileService and b) making it harder to understand how the method behaves when called from the editor. I do think that something like the NOT_MODIFIED_SINCE is something that a remote file system couldn't implement as well (e.g. that was actually taken from the HTTP 304 Not Modified as inspiration).

@@ -623,6 +713,28 @@ export class FileService implements IFileService {
});
}

// TODO@ben, I extracted this from `resolveStreamContent` but I am not sure
// what it does in comparsion to the other encoding related functions...
public getEncoding2(resource: uri, options: IResolveContentOptions, detected: IMimeAndEncoding): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a way to prioritise all the different encodings that can be determined for a file (either by detecting the encoding, by settings or by user choice). There are some rules which encoding to pick in which case. I think the method can be private, not public and be renamed to something like selectEncoding or pickEncoding or getPreferredEncoding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check

@@ -168,6 +169,10 @@ export class FileService implements IFileService {
return this.raw.resolveContent(resource, options);
}

public resolveStringStream(resource: uri, options?: IResolveContentOptions, token?: CancellationToken): IStringStream {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this new method mean for the remote file system implementation? Can a remote file system provide the same stream, I have not seen any updates of this for RemoteFileService

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In parts it actually already works like that. Tho, the 'old' stream function does do everything in once call. My plan is to nuke the old once this is done and then migrate the remote work. Makes sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

}

//#endregion

public resolveStreamContent(resource: uri, options?: IResolveContentOptions): TPromise<IStreamContent> {
Copy link
Member

@bpasero bpasero Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is now not used anymore, I suggest to keep the interface and fold your work in resolveStringStream into this method (see also below comment in textFileService).

@@ -738,6 +738,18 @@ export class TestFileService implements IFileService {
});
}

resolveStringStream(resource: URI, options?: IResolveContentOptions): IStringStream {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to ensure that the new code is covered by the same test suite that covers resolveContent today. There are many tests in fileService.test.ts that cover various things (e.g. large file, not modified since, encoding).

@bpasero bpasero merged commit f24ea58 into master Nov 14, 2017
@bpasero
Copy link
Member

bpasero commented Nov 14, 2017

@jrieken after a longer review and some minor changes around error handling, I merged this in. I also added a test specifically for the case of a file larger 64kb, just in case.

One thing I noticed is that we are not waiting for the fs.close() to finish when finishing. I am not 100% sure about the consequences to be fair. When I look e.g. at a comparable method readExactlyByFile the promise is wired up with the fs.close callback.

Another thing I noticed is that in case of an error within the resolveFileData method, we never seem to cancel the token at all from within the method. I wonder if there could be a case where an error happens but we are still calling into readChunk or handleChunk?

And then last but not least, I now notice that dev tools "break on error" directly after window reload more often when a file cannot be found, I end up in the reject case of the fs.open call. I guess that is a consequence of changing the code to be more promise based and dev tools insisting on breaking on rejected promises.

image

@jrieken
Copy link
Member Author

jrieken commented Nov 14, 2017

One thing I noticed is that we are not waiting for the fs.close() to finish when finishing

It's a bit of a trick. Why wait for for close when all we want is content. The risk is that we won't see any error that happens during close... We can either make the operation wait for close, log the error to the console, or ignore it as today 🙈

Another thing I noticed is that in case of an error within the resolveFileData method, we never seem to cancel the token at all from within the method.

It should do the same via calling finish. So abort (not call readChunk again), end the stream, and propagate errors (iff any).

@jrieken jrieken deleted the joh/stream branch November 14, 2017 09:30
@bpasero
Copy link
Member

bpasero commented Nov 14, 2017

@jrieken ok, I can add code to log the error at least and we see if it shows up after all. And yeah, looks like we never go into reading/chunking again after calling finish.

@bpasero
Copy link
Member

bpasero commented Nov 14, 2017

@jrieken I am wondering if fs.read() would ever return 0 for bytesread even if the file is still not fully read and simply because no data was read (e.g. network drives?). I am seeing suspicious retry code in stream.readExactlyByFile which could be from a long time ago when maybe node.js behaved like that.

Basically whenever we receive less bytes than we ask for, we stop reading. So far I could not figure out if this can ever happen, but maybe you came across this concern as well?

@jrieken
Copy link
Member Author

jrieken commented Nov 14, 2017

Didn't encounter it but fair concern. The ReadStream itself only checks that it got some data: https://github.com/nodejs/node/blob/803cacd228c273214072a6b9ca98d8c83d217c97/lib/fs.js#L2102

@jrieken
Copy link
Member Author

jrieken commented Nov 14, 2017

pushed 33acfb5 for that

@bpasero
Copy link
Member

bpasero commented Nov 15, 2017

@jrieken to confirm the theory I debugged what this code does:

fs.createReadStream('./test.txt').pipe(fs.createWriteStream('./copy.txt'));

It looks like the stream ends as soon as 0 bytes are read, but there is never a check for the number of bytes read matching the size of the buffer, so your fix seems right to me.

For the case of 0 bytes read, we end up pushing a null buffer into the stream here and then this seems to trigger onEofChunk.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants